Gaussian Process Assisted Meta-learning for Image Classification and Object Detection Models
Flowers, Anna R., Franck, Christopher T., Gramacy, Robert B., Krometis, Justin A.
Collecting operationally realistic data to inform machine learning models can be costly. Before collecting new data, it is helpful to understand where a model is deficient. For example, object detectors trained on images of rare objects may not identify them well under poorly represented conditions. We offer a way of informing subsequent data acquisition to maximize model performance by leveraging the toolkit of computer experiments and metadata describing the circumstances under which the training data were collected (e.g., season, time of day, location). We do this by evaluating the learner as the training data are varied according to their metadata. A Gaussian process (GP) surrogate fit to that response surface can inform new data acquisitions. This meta-learning approach improves learner performance compared to data with randomly selected metadata, which we illustrate both on classic learning examples and on a motivating application involving the collection of aerial images in search of airplanes.
- North America > United States > Virginia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (0.68)
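The acquisition loop the abstract describes — fit a GP surrogate to learner performance over the metadata space, then pick the next acquisition where the surrogate is promising — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the RBF kernel, the upper-confidence-bound rule, and all parameter values are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.3, variance=1.0):
    # Squared-exponential kernel between rows of A (n, d) and B (m, d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X, y, Xstar, noise=1e-4):
    # Standard GP regression posterior over learner performance y observed
    # at metadata settings X, queried at candidate settings Xstar.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.diag(rbf_kernel(Xstar, Xstar) - Ks.T @ np.linalg.solve(K, Ks))
    return mu, np.maximum(var, 0.0)

def next_metadata(X, y, candidates, kappa=2.0):
    # Upper-confidence-bound rule: acquire data whose metadata the surrogate
    # predicts to be high-value or about which it is highly uncertain.
    mu, var = gp_posterior(X, y, candidates)
    return candidates[np.argmax(mu + kappa * np.sqrt(var))]
```

Here `X` would hold encoded metadata (season, time of day, location) and `y` the learner's measured performance when trained on data collected under those conditions.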
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
Chen, Mingfei, Gebru, Israel D., Ananthabhotla, Ishwarya, Richardt, Christian, Markovic, Dejan, Sandakly, Jake, Krenn, Steven, Keebler, Todd, Shlizerman, Eli, Richard, Alexander
We introduce SoundVista, a method to generate the ambient sound of an arbitrary scene at novel viewpoints. Given a pre-acquired recording of the scene from sparsely distributed microphones, SoundVista can synthesize the sound of that scene from an unseen target viewpoint. The method learns the underlying acoustic transfer function that relates the signals acquired at the distributed microphones to the signal at the target viewpoint, using a limited number of known recordings. Unlike existing works, our method does not require constraints or prior knowledge of sound source details. Moreover, our method efficiently adapts to diverse room layouts, reference microphone configurations, and unseen environments. To enable this, we introduce a visual-acoustic binding module that learns visual embeddings linked with local acoustic properties from panoramic RGB and depth data. We first leverage these embeddings to optimize the placement of reference microphones in any given scene. During synthesis, we leverage multiple embeddings extracted from reference locations to get adaptive weights for their contribution, conditioned on the target viewpoint. We benchmark the task on both publicly available data and real-world settings, demonstrating significant improvements over existing methods.
- Asia > Middle East > Israel (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Information Technology > Artificial Intelligence > Vision (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
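The adaptive-weighting idea in the abstract — each reference microphone's contribution is set by how well its embedding matches the target viewpoint's — can be sketched in a few lines. The softmax form and the temperature are illustrative assumptions, not SoundVista's actual mechanism.

```python
import numpy as np

def blend_references(target_emb, ref_embs, ref_signals, temp=0.1):
    # Contribution weights for K reference microphones, conditioned on the
    # target viewpoint: softmax over embedding similarity.
    sims = ref_embs @ target_emb / temp          # (K,) similarity scores
    w = np.exp(sims - sims.max())
    w /= w.sum()
    # Weighted combination of the reference signals (K, T) -> (T,).
    return (w[:, None] * ref_signals).sum(axis=0), w
```

A reference whose embedding closely matches the target viewpoint's dominates the blend; distant or acoustically dissimilar references contribute little.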
V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models
Chiu, Hsu-kuang, Hachiuma, Ryo, Wang, Chien-Yi, Smith, Stephen F., Wang, Yu-Chiang Frank, Chen, Min-Hung
Current autonomous driving vehicles rely mainly on their individual sensors to understand surrounding scenes and plan for future trajectories, which can be unreliable when the sensors are malfunctioning or occluded. To address this problem, cooperative perception methods via vehicle-to-vehicle (V2V) communication have been proposed, but they have tended to focus on detection and tracking. How those approaches contribute to overall cooperative planning performance is still under-explored. Inspired by recent progress using Large Language Models (LLMs) to build autonomous driving systems, we propose a novel problem setting that integrates an LLM into cooperative autonomous driving, with the proposed Vehicle-to-Vehicle Question-Answering (V2V-QA) dataset and benchmark. We also propose our baseline method Vehicle-to-Vehicle Large Language Model (V2V-LLM), which uses an LLM to fuse perception information from multiple connected autonomous vehicles (CAVs) and answer driving-related questions: grounding, notable object identification, and planning. Experimental results show that our proposed V2V-LLM can be a promising unified model architecture for performing various tasks in cooperative autonomous driving, and outperforms other baseline methods that use different fusion approaches. Our work also creates a new research direction that can improve the safety of future autonomous driving systems. Our project website: https://eddyhkchiu.github.io/v2vllm.github.io/ .
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
- Automobiles & Trucks (1.00)
Handling Device Heterogeneity for Deep Learning-based Localization
Shokry, Ahmed, Youssef, Moustafa
Deep learning-based fingerprinting is one of the current promising technologies for outdoor localization in cellular networks. However, deploying such localization systems for heterogeneous phones affects their accuracy, as the cellular received signal strength (RSS) readings vary for different types of phones. In this paper, we introduce a number of techniques for addressing the phone heterogeneity problem in deep learning-based localization systems. The basic idea is either to approximate a function that maps the cellular RSS measurements between different devices or to transfer the knowledge across them. Evaluation of the proposed techniques using different Android phones on four independent testbeds shows that our techniques can improve the localization accuracy by more than 220% on the four testbeds as compared to state-of-the-art systems. This highlights the promise of the proposed device heterogeneity handling techniques for enabling a wide deployment of deep learning-based localization systems over different devices.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.06)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.05)
- North America > United States > Pennsylvania (0.04)
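The first idea in the abstract — approximating a function that maps RSS measurements between devices — can be sketched as a per-cell least-squares fit. The affine form and variable names are assumptions for illustration, not the paper's actual technique.

```python
import numpy as np

def fit_rss_mapping(rss_src, rss_dst):
    # Fit, per cell tower (column), an affine map rss_dst ~ a * rss_src + b
    # from the source phone's readings to the target phone's.
    n, n_cells = rss_src.shape
    coeffs = np.empty((n_cells, 2))
    for j in range(n_cells):
        A = np.column_stack([rss_src[:, j], np.ones(n)])
        coeffs[j] = np.linalg.lstsq(A, rss_dst[:, j], rcond=None)[0]
    return coeffs

def apply_rss_mapping(rss_src, coeffs):
    # Translate a source-phone fingerprint so that a model trained on the
    # target phone's data can localize it.
    return rss_src * coeffs[:, 0] + coeffs[:, 1]
```

The mapping is fit once from paired calibration readings; at inference time, every fingerprint from the new device is translated before being passed to the localization model.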
Model-based learning for multi-antenna multi-frequency location-to-channel mapping
Chatelier, Baptiste, Corlay, Vincent, Crussière, Matthieu, Magoarou, Luc Le
Years of study of the propagation channel have shown a close relation between a location and the associated communication channel response. The use of a neural network to learn the location-to-channel mapping can therefore be envisioned. The Implicit Neural Representation (INR) literature showed that classical neural architectures are biased towards learning low-frequency content, making learning the location-to-channel mapping a non-trivial problem. Indeed, it is well known that this mapping varies rapidly with location, on the order of the wavelength. This paper leverages the model-based machine learning paradigm to derive a problem-specific neural architecture from a propagation channel model. The resulting architecture efficiently overcomes the spectral-bias issue: it only learns low-frequency sparse correction terms activating a dictionary of high-frequency components. The proposed architecture is evaluated against classical INR architectures on realistic synthetic data, showing much better accuracy. Its mapping learning performance is explained based on the approximated channel model, highlighting the explainability of the model-based machine learning paradigm.
- Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
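The abstract's core idea — low-frequency correction terms activating a dictionary of high-frequency components — reduces to a forward model of roughly this shape. The 1-D positions and fixed frequency dictionary are simplifying assumptions for illustration.

```python
import numpy as np

def channel_forward(pos, freqs, weight_fn):
    # Dictionary of high-frequency atoms exp(j * f * x): these carry the
    # wavelength-scale oscillation, so nothing has to learn it.
    atoms = np.exp(1j * np.outer(pos, freqs))        # (n, K)
    # The learnable part is only the slowly varying complex weights w_k(x),
    # which a spectrally biased network can represent well.
    weights = weight_fn(pos)                         # (n, K)
    return (weights * atoms).sum(axis=1)             # predicted channel (n,)
```

Because the rapid oscillation lives entirely in the fixed dictionary, the network only has to fit smooth weight functions, sidestepping the spectral bias of classical INR architectures.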
A General 3D Road Model for Motorcycle Racing
Fork, Thomas, Borrelli, Francesco
We present a novel control-oriented motorcycle model and use it for computing racing lines on a nonplanar racetrack. The proposed model combines recent advances in nonplanar road models with the dynamics of motorcycles. Our approach considers the additional camber degree of freedom of the motorcycle body with a simplified model of the rider and front steering fork bodies. We demonstrate the effectiveness of our model by computing minimum-time racing trajectories on a nonplanar racetrack. Control-oriented vehicle models have seen widespread use for trajectory planning in consumer [1, 2] and motorsport [3, 4] applications.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Switzerland (0.04)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Leisure & Entertainment > Sports > Motorsports > Motorcycle Racing (0.40)
Automatic Search for Photoacoustic Marker Using Automated Transrectal Ultrasound
Wu, Zijian, Moradi, Hamid, Yang, Shuojue, Song, Hyunwoo, Boctor, Emad M., Salcudean, Septimiu E.
According to [2], 11.6% of men will develop prostate cancer (PCa) in their lifetime, with approximately a 20% death rate in the United States. Radical prostatectomy, which removes the entire prostate gland, has been a popular surgical approach to treating PCa since 1905 [3,4]. In clinical practice, traditional open radical prostatectomy (ORP) has almost entirely been replaced by laparoscopic radical prostatectomy (RLP) [5]. As a minimally invasive surgical procedure for PCa, RLP significantly reduces blood loss, hospitalization duration, and postoperative complications [6]. However, the long learning curve associated with laparoscopic procedures limits its adoption [7]. Robot-assisted laparoscopic prostatectomy (RALP) has been demonstrated [5] to shorten this learning curve by leveraging the wristed instruments and 3-D endoscopic camera of a telerobotic surgical system, usually the da Vinci surgical system, to achieve intuitive operation [8]. However, the endoscopic camera can neither localize prostate lesions nor visualize the sub-surface anatomy of the prostate gland. Therefore, a complementary medical imaging modality is necessary to facilitate RALP.
- North America > United States > Texas > Travis County > Austin (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- Health & Medicine > Surgery (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Therapeutic Area > Urology (0.89)
- Health & Medicine > Therapeutic Area > Oncology (0.67)